(57) ABSTRACT

A method and apparatus annotates a computer program to facilitate subsequent processing of the program. Code representing the program is generated at a first computer system. Annotations are generated for the code that provide information about the code. At a second computer, the code is processed according to the information provided by the annotations. The annotations, for example, can indicate a control flow graph representing a flow of execution of the code. Also, the information provided by the annotations can be a register allocation that maps data structures of the code to registers of the second computer system. The second computer system can use such information to guide the interpreting of the code or to transform the code into a more optimized form. Other exemplary annotations can indicate that running the executable form of the code would perform an unauthorized operation at the second computer system. The second computer system could then reject the code instead of performing subsequent processing on the code. When the source of the annotations is untrusted by the second computer system, the second computer system can use a checker to verify the integrity of the annotations.

16 Claims, 3 Drawing Sheets 
METHOD AND APPARATUS FOR
ANNOTATING A COMPUTER PROGRAM TO 
FACILITATE SUBSEQUENT PROCESSING 
OF THE PROGRAM 

FIELD OF THE INVENTION 

This invention relates generally to the processing of mobile computer programs, and more particularly to annotating such programs to assist downstream processing phases.

BACKGROUND 

In computer systems, and particularly in networked computer systems, computers commonly acquire programs to execute from other computers. Before executing an acquired program, the acquiring computer typically performs processing on the program. For example, the computer may compile the program into machine language native to that computer. As another example, the computer may verify that the program satisfies certain security constraints. This verification is particularly important because, generally, the computer distrusts the acquired program; the security checks ensure that the program does not tamper with files and other resources of the computer.

FIG. 1 illustrates a typical prior art network 100 in which a first computer 110 uses a program processing tool 112 to verify and compile a program downloaded from a second computer 120. The program downloaded from the second computer 120 is in an intermediate form 130 that represents the program. The second computer 120 used an intermediate code generator 150 to generate the intermediate form 130 from source code 140 of the program. At the first computer 110, the processing tool 112 analyzes the code 130 to determine whether the code 130 is safe to compile and execute. The tool 112 also performs code optimization techniques to produce executable machine code 160 native to the first computer 110.

Security checks and compiler analyses consume system time and, as a result, can reduce performance. These analyses can also be ineffective because of insufficient information to perform a proper security check or insufficient time to thoroughly process available information.

Security checks, for example, may err on the side of caution and reject secure code because the information necessary to prove that the code is secure is lacking. Moreover, a security check itself may be a source of vulnerability because it is incorrectly designed or improperly implemented. Such a flawed check may unwittingly leave the computer open to attack. Also, some compilers, such as just-in-time compilers, may not have sufficient time to perform thorough analysis for optimization. Without enough time for optimization, the machine code may perform poorly.

As a result, a need remains for a method and an apparatus that facilitate security checks and code analyses. Such a method and apparatus can lead to improved accuracy of the security checks and to machine code that performs better than what can currently be generated.

SUMMARY 

In accordance with the present invention, an objective is to enhance program code, such as mobile code, with supplementary information that will help subsequent processing stages. Having such information available during subsequent processing stages will, for example, lead to more accurate determinations of the security of the code and to improved performance of the generated machine code.

A method performed according to the principles of the 
invention achieves the aforementioned and other objectives 
when processing intermediate code generated at a first 
computer system by generating annotations for the code. 
The annotations provide information about the intermediate 
code that can be used to process the code. A second 
computer system receiving the code and the annotations can 
then process the code according to the information provided 
by the annotations. 

The annotations, in general, provide information that is 
useful to the second computer system for processing the 
code. For example, the annotations can be a control flow 
graph that represents an execution flow of the code. Also, the 
annotations can provide a register allocation that maps the 
data structures of the code to machine registers of the second 
computer system. Other annotations can provide method 
offsets. Such information provided by the annotations can be 
useful to the second computer system, for example, when 
interpreting or compiling the code. As yet another example, 
the annotations can indicate whether running the code would 
perform unauthorized operations on the second computer 
system. 

These annotations can be generated at a number of locations in a network before being transmitted to the second computer system. For example, the first computer system that produced the code can also produce the annotations and send both the code and the annotations to the receiving computer system. The first computer system may produce the code and the annotations concurrently or produce the annotations after the code has been generated. Also, the first computer system may add the annotations to the code and send both together to the second computer system, or store the annotations separately from the code and transmit the annotations and code separately. In still another example, a third computer system between the first and second computer systems, such as a computer on a firewall protecting the second computer system from potentially harmful programs, can generate and transmit the annotations to the second computer system.

Just as code from the first computer cannot always be 
trusted, downloaded annotations should also not be trusted 
unless a trusted system, such as the aforementioned third 
computer system on the firewall, generated the annotations. 
When the annotations come from an untrusted system, the 
second computer system must check the correctness of the 
annotations that the second computer system uses. Checking 
the analysis provided by the annotations, however, is often 
faster and simpler than performing the analysis, so the 
invention still improves the performance and reduces the 
vulnerability of the second computer system. 

In terms of the disclosed apparatus, the invention comprises a first computer system and a second computer system coupled to each other by a network. The second computer system requests a computer program from the first computer system. An annotator generates an annotation for the program. The annotation provides information that characterizes the program. The second computer system receives the code of the program and the annotation and processes the code according to the information provided by the annotation.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram schematic of a prior art networked computer system;

FIG. 2 is a more detailed block diagram schematic of an embodiment of the present invention; and

FIG. 3 is a block diagram of an exemplary application of the present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

FIG. 2 shows an exemplary networked computer system 200 including a program annotator 220 coupled to a first computer 210 and a second computer 230. For purposes of illustration, the user of the second computer 230 does not trust the first computer 210 or any code coming from the first computer 210. This means that the user of the second computer 230 does not know whether an executable form of the code 212 will perform any unauthorized operations, such as accessing files and directories of the second computer 230. Accordingly, the user of the code 212 should verify the integrity of untrusted code before executing it.

The first computer 210 includes an intermediate code generator 214 that converts source code 216 of a computer program into an intermediate code 212. The source code 216 can be written in any programming language, such as Java, C or C++, but the intermediate code generator 214 must be able to process the semantics and syntax of that programming language in order to produce the intermediate code 212. The intermediate code 212 produced by the generator 214 is machine-independent; that is, the code 212 itself does not run on any particular computer without further processing, e.g., interpreting or compiling. It is to be understood that the practice of the principles of the invention is not limited to intermediate code; annotations can be generated for various other types of code, such as, for example, source code, machine code, machine-dependent or machine-independent code, high-level or low-level code, assembly code, etc.

The annotator 220 includes an intermediate code analyzer 222 that analyzes the intermediate code 212 from the first computer 210 and produces annotations 224 as a result. This analysis can include, for example, mapping variables to registers, determining a control flow of the code 212, determining methods for optimizing the code 212, checking that all data structures are initialized and that the code 212 is syntactically well-formed and contains valid references to data structures, data fields, and other code, and verifying that operations performed by the code 212 do not underflow or overflow the stack. These examples are simply illustrative.
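The stack underflow and overflow check listed above can be pictured as a small worklist analysis. The Python sketch below is illustrative only: the stack instruction set (`push`, `pop`, `add`, `jz`, `jmp`, `ret`), the function `infer_stack_depths`, and the `max_stack` limit are hypothetical and are not part of the described system. It computes the stack depth on entry to each reachable instruction, which is one kind of result the analyzer 222 could record as annotations 224.

```python
from collections import deque

# Hypothetical stack instruction set: opcode -> (values popped, values pushed).
EFFECTS = {
    "push": (0, 1),
    "pop": (1, 0),
    "add": (2, 1),
    "jz": (1, 0),   # pops a condition, then falls through or jumps
    "jmp": (0, 0),
    "ret": (1, 0),  # returns the top of the stack
}


class StackError(Exception):
    """Raised when the code would underflow or overflow the operand stack."""


def successors(code, pc):
    """Program counters that may execute after instruction pc."""
    op, arg = code[pc]
    if op == "ret":
        return []
    if op == "jmp":
        return [arg]
    if op == "jz":
        return [pc + 1, arg]
    return [pc + 1]


def infer_stack_depths(code, max_stack):
    """Return {pc: stack depth on entry} for every reachable instruction."""
    depth = {0: 0}
    work = deque([0])
    while work:
        pc = work.popleft()
        op, _ = code[pc]
        if op not in EFFECTS:
            raise StackError(f"unknown opcode {op!r} at {pc}")
        pops, pushes = EFFECTS[op]
        if depth[pc] < pops:
            raise StackError(f"stack underflow at {pc}")
        out = depth[pc] - pops + pushes
        if out > max_stack:
            raise StackError(f"stack overflow at {pc}")
        for succ in successors(code, pc):
            if not 0 <= succ < len(code):
                raise StackError(f"control leaves the code at {pc}")
            if succ not in depth:
                depth[succ] = out
                work.append(succ)
            elif depth[succ] != out:
                # Two paths reach succ with different stack depths.
                raise StackError(f"inconsistent stack depth at {succ}")
    return depth
```

For the six-instruction program `[("push", 1), ("jz", 4), ("push", 2), ("ret", None), ("push", 3), ("ret", None)]` with `max_stack=4`, the sketch assigns entry depths 0, 1, 0, 1, 0, 1 to instructions 0 through 5. Iterating until every reachable instruction has a consistent depth is the part of the work that an annotation can spare the receiving computer.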

From the annotator 220, the intermediate code 212 and the annotations 224 pass to the second computer 230. Although shown in FIG. 2 as being forwarded separately to the second computer 230, the annotations 224 can instead be placed in the code 212 by the intermediate code analyzer 222, producing an annotated intermediate code. As a result, the code 212 and the annotations 224 arrive concurrently at the second computer 230.
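One way to picture annotated intermediate code is a single payload in which each instruction carries its own annotations. The sketch below assumes a JSON encoding; the format tag `annotated-ir/0` and the functions `embed` and `extract` are hypothetical and are not taken from this description.

```python
import json

# Hypothetical container format: every instruction carries its own
# annotation dictionary, so code and annotations travel as one unit.
FORMAT = "annotated-ir/0"


def embed(code, annotations):
    """code: list of (opcode, operand); annotations: {pc: dict}."""
    body = [
        {"op": op, "arg": arg, "ann": annotations.get(pc, {})}
        for pc, (op, arg) in enumerate(code)
    ]
    return json.dumps({"format": FORMAT, "body": body})


def extract(payload):
    """Split an annotated payload back into (code, annotations)."""
    doc = json.loads(payload)
    if doc.get("format") != FORMAT:
        raise ValueError("not annotated intermediate code")
    code = [(ins["op"], ins["arg"]) for ins in doc["body"]]
    annotations = {pc: ins["ann"] for pc, ins in enumerate(doc["body"]) if ins["ann"]}
    return code, annotations
```

Because the annotations are part of the payload, any recipient of the annotated code receives the analysis along with it, rather than only the computer that happened to cache it.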

Placing the annotations 224 in the code 212 removes the need for locally caching the analysis. Before the present invention, each user of the intermediate code 212 would store the analysis performed on the code 212 for subsequent use, so that the computer would not have to repeat the analysis each time the intermediate code 212 was downloaded. With local caching, however, only the computer with the cached analysis benefited from that analysis. Using the present invention, the analysis that is recorded by the annotations 224 in the intermediate code 212 can benefit any user with access to the annotated intermediate code.

The annotator 220 can reside at the first computer 210 or at a third computer (not shown) connected to both the first and second computers 210, 230. Conceivably, the annotator 220 could reside at the second computer 230, but the benefits of annotating are greater when the intermediate code 212 arrives at the second computer 230 already annotated.

Normally, it would be easier to annotate the intermediate code 212 at the same computer where the intermediate code 212 is produced because of the availability of the original source code 216. For example, when the annotator 220 resides at the first computer system 210, the code and the annotations 224 can be produced concurrently, or the annotations 224 can be produced after the code has been generated. Having the annotator 220 reside at the first computer 210, therefore, produces advantages. On the other hand, the annotations 224 produced by the first computer 210 are untrusted because the first computer 210 is untrusted. Thus, the second computer 230 should verify the integrity of the annotations 224.

For this purpose, the second computer 230 has a checker 240 for verifying the integrity of the annotations 224. Because it is often faster and simpler to check annotations than to produce them, the advantages of annotating at an untrusted system remain. The checker 240 can immediately reject the code 212 when the checker 240 determines that the annotations 224 are invalid. Invalid annotations 224 include annotations that falsely represent the operation of the code 212, that indicate operations unauthorized by the second computer 230, or that are not well-formed, i.e., fail to follow a particular format. Conversely, valid annotations 224 are well-formed and accurately reflect the operation of the code 212. The checker 240, then, can quickly conclude from the annotations 224 whether the intermediate code 212 should be subsequently processed, e.g., interpreted or compiled.
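To illustrate why checking can be cheaper than producing an annotation, consider an annotation that claims the operand-stack depth on entry to each instruction (the stack underflow and overflow verification mentioned earlier). The Python sketch below assumes a hypothetical stack instruction set and a hypothetical function `check_stack_annotation`. Instead of iterating a worklist to a fixed point, it visits each claimed instruction once and confirms that every successor's claimed depth equals the depth the instruction leaves behind; any mismatch marks the annotations invalid.

```python
# Hypothetical stack instruction set: opcode -> (values popped, values pushed).
EFFECTS = {"push": (0, 1), "pop": (1, 0), "add": (2, 1),
           "jz": (1, 0), "jmp": (0, 0), "ret": (1, 0)}


def successors(code, pc):
    """Program counters that may execute after instruction pc."""
    op, arg = code[pc]
    if op == "ret":
        return []
    if op == "jmp":
        return [arg]
    if op == "jz":
        return [pc + 1, arg]
    return [pc + 1]


def check_stack_annotation(code, claimed, max_stack):
    """Return True if the claimed {pc: entry depth} map is consistent.

    The entry point must be claimed at depth 0 and every successor of a
    claimed instruction must itself be claimed, so one pass over the
    claims covers every reachable instruction.
    """
    if claimed.get(0) != 0:
        return False
    for pc, depth in claimed.items():
        if not 0 <= pc < len(code) or code[pc][0] not in EFFECTS:
            return False
        pops, pushes = EFFECTS[code[pc][0]]
        out = depth - pops + pushes
        if depth < pops or out > max_stack:
            return False  # the annotation admits an underflow or overflow
        for succ in successors(code, pc):
            if claimed.get(succ) != out:
                return False  # false or missing claim: reject the code
    return True
```

A tampered annotation, such as one claiming depth 2 on entry to the first `ret` of the program `[("push", 1), ("jz", 4), ("push", 2), ("ret", None), ("push", 3), ("ret", None)]`, fails at the instruction that precedes it, so the code can be rejected without any further analysis.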

The dashed lines in FIG. 2 indicate that the second computer 230 may not need a checker 240 when the annotations 224 come from a trusted source. An example of a trusted source is a third computer (not shown) at a firewall between the first computer 210 and the second computer 230, protecting the second computer 230 from harmful programs. The annotator 220 can reside at this third computer and produce annotations 224 that are trusted by the second computer 230.

The second computer 230 includes a compiler 234 for transforming the intermediate code 212 into executable machine code 250. The machine code 250 is specific to the microprocessor of the second computer 230. The compiler 234 has added capabilities for handling the format of the annotations 224 and for using the annotations 224 as guidance during construction of the machine code 250. For example, the additional capabilities of the compiler 234 include analyzing the annotations 224 and rejecting the intermediate code 212 when the annotations 224 indicate that the code 212 is not secure. The compiler 234 can also reference the annotations 224 to optimize the machine code 250. Alternatively, the second computer 230 can include an interpreter capable of using the annotations to determine whether to execute the intermediate code 212 and then to guide any subsequent code execution.

ANNOTATIONS 

In general, the annotations 224 produced by the analyzer 222 include any information about the code 212 that can be obtained from static analysis. This information facilitates subsequent processing of the code 212. The annotations 224 that provide information about the code 212 are various and fall into at least two types: annotations that characterize properties of the code, and annotations that are in the form of a formal proof of the code. This categorization is not intended to be exhaustive, but rather to distinguish annotations that characterize properties of the code from annotations that are a proof of the code.

The first type of annotations 224, those that characterize properties of the code 212, provide the second computer 230 with information that assists in a wide variety of subsequent processing of that code 212. Such subsequent processing includes determining whether the code is safe for additional subsequent processing, such as executing machine code, or interpreting or compiling intermediate code. For example, when the code 212 is in machine code form, this type of annotation 224 can contain information about how the code accesses memory, allowing the second computer 230 to conclude that this machine code is safe to execute, or information about which registers are live at different program points, allowing the code to be optimized for increased performance. Alternatively, when the code 212 is in an intermediate code form, the annotations can provide useful information for optimally interpreting the intermediate code or transforming the intermediate code into an executable form.

The information provided by these annotations can range from a detailed description of a particular property of the code to a broad, overall perspective of the entire code 212. For instance, exemplary annotations can characterize the behavior of a single code statement, a block of code statements, or the flow of execution of the entire code 212. The following examples illustrate the diversity and uses of annotations that characterize properties of the code. Any one or all of these exemplary annotations may be generated for the code 212 and used by the second computer 230 as an aid in the subsequent processing of the code 212.

Exemplary annotations 224 of the first type can indicate what variables are used in the code 212 and the types of values stored in those variables. For example, the annotation for a particular code statement can be

$\{X_1 : \mathrm{integer},\ X_2 : \mathrm{undefined}\}$,

where $X_1$ and $X_2$ are the two variables used by the intermediate code 212. This exemplary annotation indicates that at this point in the code 212, the variable $X_1$ holds a data structure of an integer type, while the type of the data structure in $X_2$ is undefined. Such annotations 224, for example, can simplify and accelerate for the second computer 230 the task of type-checking data structures of the code 212 to determine whether the intermediate code 212 is secure for subsequent execution. Thus, the second computer 230 can determine beforehand that run-time checks of the intermediate code 212 are unnecessary. As another exemplary use of such annotations, the information about the data types can assist run-time optimization by enabling tag-less garbage collection.
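A receiving computer could confirm per-statement type maps such as the one described above and, in doing so, establish that an addition never needs a run-time type check. The sketch below is a minimal illustration for straight-line code only; the two statement forms (`const` and `add`), the two-value type lattice, and the functions `transfer` and `check_type_annotations` are hypothetical.

```python
# Hypothetical type lattice for this sketch: a variable is either
# known to hold an integer or is still undefined.
INTEGER, UNDEFINED = "integer", "undefined"


def transfer(types, stmt):
    """Return the variable types after stmt, or None if stmt is ill-typed.

    Hypothetical statement forms:
      ("const", dst, literal)  - dst := integer literal
      ("add", dst, a, b)       - dst := a + b, valid only on integers
    """
    out = dict(types)
    if stmt[0] == "const":
        _, dst, value = stmt
        if not isinstance(value, int):
            return None
        out[dst] = INTEGER
    elif stmt[0] == "add":
        _, dst, a, b = stmt
        if types.get(a) != INTEGER or types.get(b) != INTEGER:
            return None  # would need a run-time tag check, so reject
        out[dst] = INTEGER
    else:
        return None
    return out


def check_type_annotations(stmts, claims):
    """Verify claimed type maps for straight-line code.

    claims[i] is the type map claimed to hold before stmts[i]; the last
    entry is the map after the final statement.
    """
    if len(claims) != len(stmts) + 1:
        return False
    if any(t != UNDEFINED for t in claims[0].values()):
        return False  # nothing is initialized on entry
    return all(transfer(claims[i], s) == claims[i + 1] for i, s in enumerate(stmts))
```

With `stmts = [("const", "x1", 5), ("add", "x2", "x1", "x1")]`, the claim that holds between the two statements is `{"x1": "integer", "x2": "undefined"}`, the same shape as the annotation in the text; the checker accepts it, and the addition is known to operate on integers before the code runs.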

Another exemplary annotation 224 is a control flow graph that represents the flow of execution of the entire code 212. Some exemplary annotations 224 can be less encompassing and represent the behavior of blocks of code statements. Such annotations 224 for blocks of statements can be placed at a block entry point, at an exit point, or at both points.
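A control flow graph annotation could, for instance, list the basic blocks of the code and the edges between them. The sketch below shows one conventional way an annotator might compute such a graph, using the standard leader-based partition into basic blocks. The tiny instruction format (`jmp`, `jz`, and `ret` as the only control transfers) and the name `build_cfg` are hypothetical.

```python
def build_cfg(code):
    """Partition code into basic blocks and connect them.

    code is a list of (opcode, operand); "jmp" and "jz" take a target pc and
    "ret" ends execution. Returns (blocks, edges): blocks maps each block's
    first pc to its last pc, and edges is a set of (from_start, to_start).
    """
    n = len(code)
    if n == 0:
        return {}, set()
    leaders = {0}
    for pc, (op, arg) in enumerate(code):
        if op in ("jmp", "jz"):
            if not 0 <= arg < n:
                raise ValueError(f"jump target {arg} out of range at {pc}")
            leaders.add(arg)              # a jump target starts a block
        if op in ("jmp", "jz", "ret") and pc + 1 < n:
            leaders.add(pc + 1)           # so does the instruction after a transfer
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n - 1]
    blocks = dict(zip(starts, ends))

    edges = set()
    for start, end in blocks.items():
        op, arg = code[end]
        if op in ("jmp", "jz"):
            edges.add((start, arg))
        if op not in ("jmp", "ret") and end + 1 < n:
            edges.add((start, end + 1))   # fall-through edge
    return blocks, edges
```

For a loop such as `[("push", 3), ("push", 1), ("add", None), ("jmp", 1)]`, the sketch yields two blocks, `{0: 0, 1: 3}`, and the edges `{(0, 1), (1, 1)}`, the second being the back edge of the loop. The entry and exit points of each block are natural places to attach the per-block annotations mentioned above.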

Other annotations 224 can map data structures to machine registers of the second computer 230. The mapping of data structures to machine registers can help optimize machine code 250 through efficient use of the machine registers. This register allocation can benefit just-in-time compilers, which commonly make sub-optimal use of the registers because of the limited time in which to analyze intermediate code 212.
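Because the register allocation arrives as an annotation, the second computer 230 need only confirm it rather than compute it. The sketch below assumes straight-line code in which each variable is assigned exactly once. The hypothetical checker `check_register_allocation` derives each variable's live interval and verifies that no two variables whose intervals overlap are mapped to the same register. The statement format and register names are assumptions made for illustration.

```python
def live_intervals(stmts):
    """Map each variable to (definition pc, last-use pc).

    stmts is straight-line code given as (dst, [src, ...]) pairs in which
    every variable is assigned exactly once (a hypothetical format).
    """
    intervals = {}
    for pc, (dst, srcs) in enumerate(stmts):
        for src in srcs:
            if src not in intervals:
                raise ValueError(f"{src} used before definition at {pc}")
            intervals[src] = (intervals[src][0], pc)
        if dst in intervals:
            raise ValueError(f"{dst} assigned more than once")
        intervals[dst] = (pc, pc)
    return intervals


def check_register_allocation(stmts, allocation, machine_registers):
    """Return True if the claimed {variable: register} map is safe to use."""
    intervals = live_intervals(stmts)
    if set(allocation) != set(intervals):
        return False
    if not set(allocation.values()) <= set(machine_registers):
        return False
    names = sorted(intervals)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_def, a_last = intervals[a]
            b_def, b_last = intervals[b]
            # Values interfere if each is defined before the other's last use;
            # a register may be reused by the instruction that last reads it.
            if a_def < b_last and b_def < a_last and allocation[a] == allocation[b]:
                return False
    return True
```

For `[("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])]`, only `a` and `b` are live at the same time, so the allocation `{"a": "r0", "b": "r1", "c": "r0", "d": "r0"}` passes with two registers, while mapping `b` to `r0` as well is rejected.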

Still other annotations 224 that characterize the code 212 can provide method offsets. Method offsets direct the compiler 234 to locations within an object where the compiler 234 can find particular methods. These annotations can help the compiler 234 avoid clashes in method offsets in situations of multiple inheritance. Still other annotations 224 may show when a level of indirection can be removed from a data structure.
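As an illustration of an offset clash under multiple inheritance, the sketch below checks a method-offset annotation: every method visible in a class must have a slot, and no two methods of the same class may share one. The class description format and the names `visible_methods` and `check_method_offsets` are hypothetical, and the sketch does not model how a real object layout reconciles a subclass's table with those of its parents; it only checks the claimed table for a clash.

```python
def visible_methods(classes, name):
    """All methods a class declares or inherits (hypothetical class format)."""
    parents, own = classes[name]
    methods = set(own)
    for parent in parents:
        methods |= visible_methods(classes, parent)
    return methods


def check_method_offsets(classes, offsets):
    """Verify a claimed method-offset annotation.

    classes maps a class name to (parent names, declared method names);
    offsets maps a class name to {method name: slot index}. Each class
    must give every visible method a slot, and no two methods may share
    a slot, which is the clash that multiple inheritance can cause.
    """
    for name in classes:
        table = offsets.get(name, {})
        if set(table) != visible_methods(classes, name):
            return False
        slots = list(table.values())
        if any(not isinstance(s, int) or s < 0 for s in slots):
            return False
        if len(set(slots)) != len(slots):
            return False
    return True
```

If class `C` inherits from both `A` (method `f` at slot 0) and `B` (method `g` at slot 0), naively keeping both parents' offsets puts `f` and `g` in the same slot of `C`, and the check fails; an annotation that moves `g` to slot 1 in `C` passes.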

Annotations of the second type provide a formal proof of some property of the code. The formal proof uses formal logic to reason about the code. When the proof is validated, it assures that the code will behave according to a prescribed policy. An example of the second type of annotations is described by George Necula in "Proof-Carrying Code", 1997, incorporated by reference herein. There, a compiler adds a formal proof to native binary code while the compiler produces the binary code. When the proof is validated, the binary code is deemed safe to execute.

Annotations of the second type can be used to practice the principles of the present invention. A proof provided by such annotations can be used to determine whether code should be subsequently executed, i.e., compiled or interpreted. When the proof is validated, annotations of the previously mentioned first type can then be used to guide such subsequent execution.

In general, to produce annotations, the analyzer 222 statically analyzes the intermediate code 212 like a conventional compiler. Off-loading the analyses to the analyzer 222 allows the second computer 230 to process the intermediate code more quickly and more effectively (e.g., to produce better machine code 250) than if the second computer 230 had to perform its own analyses. This is because the annotator 220 may have more time than the second computer 230 to produce a more thorough analysis. Also, the annotator 220 may have access to the source code 216, which may not be available to the second computer 230.

FIG. 3 illustrates an exemplary application using the principles of the present invention to process a computer program. A communication network 300 connects a server 302 in the network 300 with a client computer 304 by a network link 306. An example of such a network 300 is the Internet. The server 302 supports a web page; that is, the server 302 maintains documents, pages and other forms of data for retrieval. Applets, which are small programs compiled to an intermediate form, might be attached to the web page when the web page is retrieved.

The server 302 includes an annotator 303 that statically 
analyzes and annotates, according to the principles of the 
invention, each applet attached to the web page. That the 
annotator 303 statically analyzes the applet before the applet 
is sent to the client 304 distinguishes the present invention 
from known techniques, such as a Java™ virtual machine, 
that analyze the applet at the client 304. 

The client 304 includes memory 310 for storing a browser 312, an annotation checker 314, and a compiler 316. The memory 310 can include a hard disk and main memory. The browser 312 provides a user of the client 304 with an interface that simplifies access to the server 302. Examples of a browser are Netscape Communicator™ and Microsoft Internet Explorer™.

During an execution of the browser 312, the client 304 can request access to the web page on the server 302. The browser 312 issues the request to the server over the link 306. In response to the request, the server 302 returns the data associated with the requested web page to the client 304. When the retrieved web page is accompanied by an attached applet, the server 302 sends the annotated intermediate code representing the applet to the client 304.

When the server 302 is trusted by the client 304, the client 304 can process the annotated intermediate code according to the annotations embedded in the code without having to verify the annotations. This processing can include checking the safety of the applet and executing the applet. As used in this context, "executing" means interpreting or compiling. For example, the annotations may provide typing of the variables in the code from which the browser can determine whether the applet is safe to execute on the client 304. As another example, the annotations can suggest register allocations that help the browser execute the applet through efficient use of machine registers of the client 304.

Typically, however, the client 304 does not trust the applet produced by the server 302. In this event, the client 304 would analyze the annotations along with the applet to make sure that the applet would not perform any unwanted operations when the applet runs on the client 304. The checker 314 accordingly verifies the integrity of the annotations in the applet code. The browser 312 rejects the applet when the checker 314 determines that the annotations are false. On the other hand, when the checker 314 determines that the annotations are valid, the browser 312 can process the applet, as previously noted, according to the annotations in the applet code.
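The client-side decision just described can be summarized in a few lines. In this sketch, `process_applet`, `checker`, and `compile_with` are hypothetical names: `checker` stands in for the annotation checker 314, and `compile_with` for whatever compiler or interpreter consumes the annotations.

```python
from typing import Any, Callable, Optional

Checker = Callable[[Any, Any], bool]
Compiler = Callable[[Any, Any], Any]


def process_applet(code: Any, annotations: Any, source_trusted: bool,
                   checker: Checker, compile_with: Compiler) -> Optional[Any]:
    """Return an executable form of the applet, or None if it is rejected."""
    if not source_trusted and not checker(code, annotations):
        # Annotations from an untrusted source failed verification.
        return None
    # Trusted or verified annotations guide compilation or interpretation.
    return compile_with(code, annotations)
```

The design keeps the checker off the trusted path entirely, which mirrors the case in which the server 302 is trusted and the annotations are used without verification.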

Although described within the context of the Internet and web browsers, the invention can be practiced within any other context where programs are annotated to facilitate subsequent program processing stages. The foregoing description has been directed to specific embodiments of this invention. It will be apparent, however, that variations and modifications may be made to the described embodiments, with the attainment of all or some of the advantages. It is the object of the appended claims, therefore, to cover all such variations and modifications as come within the spirit and scope of the invention.

What is claimed is: 

1. A computerized method for processing code representing a computer program, the code being generated at a first computer system, the method comprising the steps of:

generating an annotation for the code that characterizes at least one property of the code;

analyzing the annotation, at a second computer system, to determine whether the code can safely operate at the second computer system and to provide information for optimizing the code's operating performance; and

transforming and optimizing the code into an executable code in the second computer according to the information contained in the annotation if the analysis indicates that the code can be safely operated.

2. The method of claim 1 further comprising interpreting the code according to the information provided by the annotation.

3. The method of claim 1 wherein the annotation includes information on register allocation that maps data structures of the code to registers of the second computer system.

4. The method of claim 1 wherein the annotation includes 
information on a control flow graph representing a flow of 
execution of the code. 

5. The method of claim 1 wherein the annotation includes information on a method offset.

6. The method of claim 1 wherein the annotation indicates 
data types of variables in the code. 
7. The method of claim 1, further comprising the step of:
verifying at the second computer system that the annotation is valid.

8. The method of claim 1 wherein the generating of the 
annotation occurs at the first computer system. 

9. The method of claim 1 wherein the generating of the 
annotation occurs at a third computer system. 

10. The method of claim 1, further comprising the steps of:

adding the annotation to the code to produce annotated code; and

sending the annotated code to the second computer system.

11. The method of claim 1 wherein the code is intermediate code requiring processing before the code can operate at the second computer system.

12. The method of claim 1, further comprising the steps of determining from the information provided by the annotations whether the code can be trusted to operate at the second computer system and operating the code only if the code can be trusted.

13. The method of claim 1 wherein the code is trusted to 
operate at the second computer system when the annotations 
are generated at a trusted computer system. 

14. The method of claim 1 wherein determining whether 
the code should be processed includes determining whether 
running an executable form of the code would perform an 
unauthorized operation at the second computer system. 

15. An apparatus for processing a computer program, comprising:

a first computer system and a second computer system coupled to each other by a network, the second computer system requesting a computer program from the first computer system;

an annotator, coupled to receive the program, generating an annotation for the program, the annotation characterizing at least one property of the program; and

the second computer receiving the code and the annotation, the second computer analyzing the annotation to determine whether the code can safely operate at the second computer system and provide information for optimizing the code's operating performance, and if the analysis indicates that the code can be safely operated, the second computer system transforming and optimizing the code into an executable code in the second computer according to the information contained in the annotation.

16. A system for processing a computer program, the system comprising:

a first computer system and a second computer system coupled to each other by a network, the first computer system comprising a means for generating code;

means for generating an annotation for the code, the annotation providing information that characterizes at least one property of the code;

means for analyzing the annotation, at the second computer system, to determine whether the code can safely operate at the second computer system and to provide information for optimizing the code's operating performance; and

means for transforming and optimizing the code into an executable code in the second computer according to the information contained in the annotation if the analysis indicates that the code can be safely operated.
